Mielke Distribution (mielke, Mielke Beta-Kappa / Dagum)#
The Mielke Beta-Kappa distribution (often called the Dagum distribution) is a flexible family on positive real values with polynomial (heavy) tails.
It is a good default when:
your data are strictly positive (precipitation amounts, income/wealth, claim sizes, waiting times)
you need a model that can be very right-skewed with power-law tails
you want simple sampling (closed-form inverse CDF)
Learning goals#
By the end you should be able to:
write the PDF/CDF/quantile function and interpret the parameters
compute moments (and know when they do not exist)
derive expectation/variance and the likelihood
sample from mielke using inverse-transform sampling (NumPy-only)
fit and use the distribution via scipy.stats.mielke
Notation#
Shape parameters: \(k > 0\), \(s > 0\)
Random variable: \(X \sim \mathrm{Mielke}(k, s)\) (standard form)
Standard support: \(x > 0\)
Table of contents#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import numpy as np
import plotly
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
import scipy
from scipy import stats
from scipy.special import gammaln, psi, logsumexp
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
print("numpy:", np.__version__)
print("scipy:", scipy.__version__)
print("plotly:", plotly.__version__)
numpy: 1.26.2
scipy: 1.15.0
plotly: 6.5.2
1) Title & Classification#
Name: mielke (Mielke Beta-Kappa; also known as the Dagum distribution)
Type: continuous
Standard support: \(x > 0\) (SciPy defines the PDF for \(x \ge 0\); if \(k<1\) the density diverges as \(x\to 0^+\))
Parameter space (standard form): \(k > 0\), \(s > 0\)
Location/scale form (SciPy): \(X = \mathrm{loc} + \mathrm{scale}\cdot Y\) with \(Y \sim \mathrm{Mielke}(k, s)\)
Support becomes \(x > \mathrm{loc}\)
\(\mathrm{scale} > 0\)
2) Intuition & Motivation#
What it models#
mielke is a positive, right-skewed, heavy-tailed model. It is useful when:
small values occur frequently (controlled by \(k\))
rare, extremely large values occur (tail heaviness controlled by \(s\))
a lognormal/gamma tail is too light
The survival function decays like a power law:
$$\bar F(x) = 1 - F(x) \sim \frac{k}{s}\,x^{-s}, \qquad x\to\infty.$$
So \(s\) behaves like a tail index.
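As a quick numerical sanity check (not part of the derivation), the survival function from scipy.stats.mielke can be compared with the asymptote \((k/s)\,x^{-s}\); the parameter values below are arbitrary:

```python
import numpy as np
from scipy import stats

# Tail check: S(x) = 1 - F(x) should approach (k/s) * x^{-s} for large x.
k, s = 2.5, 3.0
x = np.array([10.0, 100.0, 1000.0])
ratio = stats.mielke.sf(x, k, s) / ((k / s) * x ** (-s))
print(ratio)  # approaches 1 as x grows
```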
Typical real-world use cases#
Precipitation / hydrology (Mielke introduced this family for rainfall amounts)
Income/wealth modeling (Dagum distribution in economics)
Insurance claim sizes and other positive heavy-tailed costs
Reliability / lifetime modeling when failures can occur very late
Relations to other distributions#
Burr Type III / Dagum: mielke(k, s) is exactly SciPy's burr(c=s, d=k/s).
Log-logistic (fisk): the constraint \(k=s\) gives
$$f(x)=\frac{s\,x^{s-1}}{(1+x^s)^2},$$
which is the log-logistic (fisk) density.
Beta-prime connection: if \(Y = X^s\), then
$$f_Y(y)=\frac{k}{s}\,\frac{y^{k/s-1}}{(1+y)^{1+k/s}},\quad y>0,$$
so \(Y\) is Beta-prime with parameters \((k/s,\,1)\).
Beta connection: if \(U = \frac{X^s}{1+X^s}\), then \(U\in(0,1)\) and
$$U \sim \mathrm{Beta}(k/s,\,1).$$
This gives an immediate sampler.
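The log-logistic special case can be verified numerically (a quick check; the evaluation grid is arbitrary):

```python
import numpy as np
from scipy import stats

# With k = s = c, mielke reduces to the log-logistic (fisk) density.
c = 2.5
x = np.linspace(0.1, 10.0, 50)
max_diff = np.max(np.abs(stats.mielke.pdf(x, c, c) - stats.fisk.pdf(x, c)))
print(max_diff)  # floating-point noise, essentially zero
```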
3) Formal Definition#
CDF#
In the standard (unshifted, unit-scale) parameterization, for \(x>0\):
$$F(x) = \left(1 + x^{-s}\right)^{-k/s}.$$
PDF#
Differentiating the CDF gives the density:
$$f(x) = \frac{k\,x^{k-1}}{(1+x^s)^{1+k/s}}, \qquad x>0.$$
Quantile function (inverse CDF)#
Let \(p\in(0,1)\). Solving \(p = (1 + x^{-s})^{-k/s}\) for \(x\) gives:
$$Q(p) = \left(\frac{p^{s/k}}{1 - p^{s/k}}\right)^{1/s}.$$
Location/scale#
SciPy uses the standard location/scale convention: \(X = \mathrm{loc} + \mathrm{scale}\cdot Y\) with \(Y \sim \mathrm{Mielke}(k, s)\).
Then \(X\) has support \(x>\mathrm{loc}\) and
$$f_X(x) = \frac{1}{\mathrm{scale}}\, f\!\left(\frac{x-\mathrm{loc}}{\mathrm{scale}}\right).$$
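The location/scale convention can be sanity-checked directly with SciPy (the particular loc/scale values below are arbitrary):

```python
import numpy as np
from scipy import stats

k, s, loc, scale = 2.5, 3.0, 1.0, 2.0
x = np.array([1.5, 2.0, 4.0, 9.0])  # all above loc
shifted = stats.mielke.pdf(x, k, s, loc=loc, scale=scale)
manual = stats.mielke.pdf((x - loc) / scale, k, s) / scale
print(np.max(np.abs(shifted - manual)))  # ~0
```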
def mielke_logpdf(x, k, s):
    '''Log-PDF of the standard Mielke (Beta-Kappa / Dagum) distribution (loc=0, scale=1).'''
    x = np.asarray(x, dtype=float)
    k = float(k)
    s = float(s)
    out = np.full_like(x, -np.inf, dtype=float)
    if (k <= 0) or (s <= 0):
        return out
    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)
    # log(1 + x^s) computed stably as log(1 + exp(s log x))
    log1p_xs = np.logaddexp(0.0, s * logx)
    out[mask] = np.log(k) + (k - 1.0) * logx - (1.0 + k / s) * log1p_xs
    return out

def mielke_pdf(x, k, s):
    return np.exp(mielke_logpdf(x, k, s))

def mielke_logcdf(x, k, s):
    '''Log-CDF of the standard Mielke distribution.'''
    x = np.asarray(x, dtype=float)
    k = float(k)
    s = float(s)
    out = np.full_like(x, -np.inf, dtype=float)
    if (k <= 0) or (s <= 0):
        return out
    mask = x > 0
    xm = x[mask]
    logx = np.log(xm)
    # Use F(x) = (1 + x^{-s})^{-k/s} to avoid cancellation when x is large.
    log1p_xnegs = np.logaddexp(0.0, -s * logx)  # log(1 + x^{-s})
    out[mask] = -(k / s) * log1p_xnegs
    return out

def mielke_cdf(x, k, s):
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x, dtype=float)
    mask = x > 0
    out[mask] = np.exp(mielke_logcdf(x[mask], k, s))
    return out

def mielke_ppf(p, k, s):
    '''Quantile function Q(p) for p in [0,1].'''
    p = np.asarray(p, dtype=float)
    k = float(k)
    s = float(s)
    x = np.full_like(p, np.nan, dtype=float)
    if (k <= 0) or (s <= 0):
        return x
    x[p == 0] = 0.0
    x[p == 1] = np.inf
    mask = (p > 0) & (p < 1)
    # Let a = (s/k) log p, so q = p^{s/k} = exp(a) in (0,1).
    a = (s / k) * np.log(p[mask])
    # q/(1-q) = exp(a) / (1-exp(a)) = exp(a) / (-expm1(a)) (stable when a≈0).
    log_ratio = a - np.log(-np.expm1(a))
    x[mask] = np.exp(log_ratio / s)
    return x

def mielke_rvs_numpy(k, s, size, rng=None):
    '''NumPy-only sampler via inverse-transform sampling.'''
    rng = np.random.default_rng() if rng is None else rng
    u = rng.random(size)
    return mielke_ppf(u, k, s)

def mielke_raw_moment(n, k, s):
    '''Raw moment E[X^n] for -k < n < s; returns +inf when the moment diverges.'''
    k = float(k)
    s = float(s)
    n = float(n)
    if (k <= 0) or (s <= 0):
        return np.nan
    # Integrability:
    # - near 0: f(x) ~ k x^{k-1} so E[X^n] finite iff n > -k
    # - in the tail: f(x) ~ k x^{-s-1} so E[X^n] finite iff n < s
    if not (-k < n < s):
        return np.inf
    return np.exp(gammaln((k + n) / s) + gammaln(1.0 - n / s) - gammaln(k / s))

def mielke_entropy(k, s):
    '''Differential entropy of the standard Mielke distribution.'''
    k = float(k)
    s = float(s)
    if (k <= 0) or (s <= 0):
        return np.nan
    D = psi(1.0 + k / s) - psi(1.0)
    return -np.log(k) + (k - 1.0) / k + (1.0 + 1.0 / s) * D

def mielke_summary_stats(k, s):
    '''Mean/variance/skewness/excess kurtosis (when finite), else nan/inf.'''
    k = float(k)
    s = float(s)
    mean = mielke_raw_moment(1.0, k, s) if s > 1 else np.inf
    if s <= 2:
        return mean, np.inf, np.nan, np.nan
    m2 = mielke_raw_moment(2.0, k, s)
    var = m2 - mean**2
    skew = np.nan
    exkurt = np.nan
    if s > 3:
        m3 = mielke_raw_moment(3.0, k, s)
        mu3 = m3 - 3.0 * m2 * mean + 2.0 * mean**3
        skew = mu3 / (var ** 1.5)
    if s > 4:
        m3 = mielke_raw_moment(3.0, k, s)  # defined since s > 4
        m4 = mielke_raw_moment(4.0, k, s)
        mu4 = m4 - 4.0 * m3 * mean + 6.0 * m2 * mean**2 - 3.0 * mean**4
        exkurt = mu4 / (var**2) - 3.0
    return mean, var, skew, exkurt
4) Moments & Properties#
Existence of moments (key takeaway)#
The right tail is polynomial:
$$f(x) = \frac{k\,x^{k-1}}{(1+x^s)^{1+k/s}} \sim k\,x^{-s-1}, \qquad x\to\infty.$$
So positive moments satisfy:
\(\mathbb{E}[X^n] < \infty\) iff \(n < s\).
Near 0, \(f(x)\approx k x^{k-1}\), so negative moments exist iff \(n > -k\).
Overall: \(\mathbb{E}[X^n] < \infty\) iff \(-k < n < s\).
Raw moments#
For \(-k < n < s\):
$$\mathbb{E}[X^n] = \frac{\Gamma\!\left(\tfrac{k+n}{s}\right)\,\Gamma\!\left(1-\tfrac{n}{s}\right)}{\Gamma\!\left(\tfrac{k}{s}\right)}.$$
Mean and variance#
Mean (exists for \(s>1\)):
$$\mathbb{E}[X] = \frac{\Gamma\left(\tfrac{k+1}{s}\right)\,\Gamma\left(1-\tfrac{1}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}$$
Second moment (exists for \(s>2\)):
$$\mathbb{E}[X^2] = \frac{\Gamma\left(\tfrac{k+2}{s}\right)\,\Gamma\left(1-\tfrac{2}{s}\right)}{\Gamma\left(\tfrac{k}{s}\right)}$$
Variance (exists for \(s>2\)):
$$\mathrm{Var}(X)=\mathbb{E}[X^2] - \mathbb{E}[X]^2$$
Skewness and kurtosis#
Skewness exists for \(s>3\)
Excess kurtosis exists for \(s>4\)
You can compute them from raw moments \(m_n = \mathbb{E}[X^n]\) via standard formulas.
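As a worked example of those standard formulas (the choice s = 5 is illustrative, picked so that all four moments exist), the skewness and excess kurtosis computed from raw moments match SciPy's mvsk output:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def raw_moment(n, k, s):
    # E[X^n] = Gamma((k+n)/s) Gamma(1 - n/s) / Gamma(k/s), valid for -k < n < s
    return np.exp(gammaln((k + n) / s) + gammaln(1.0 - n / s) - gammaln(k / s))

k, s = 2.5, 5.0  # s > 4, so skewness and excess kurtosis exist
m = [raw_moment(n, k, s) for n in (1, 2, 3, 4)]
var = m[1] - m[0] ** 2
skew = (m[2] - 3 * m[1] * m[0] + 2 * m[0] ** 3) / var**1.5
exkurt = (m[3] - 4 * m[2] * m[0] + 6 * m[1] * m[0] ** 2 - 3 * m[0] ** 4) / var**2 - 3
mean_sp, var_sp, skew_sp, kurt_sp = stats.mielke.stats(k, s, moments="mvsk")
print(skew, float(skew_sp))
print(exkurt, float(kurt_sp))
```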
MGF / characteristic function#
The MGF \(M(t)=\mathbb{E}[e^{tX}]\) diverges for any \(t>0\) because the tail is polynomial.
The characteristic function \(\varphi(t)=\mathbb{E}[e^{itX}]\) exists for all real \(t\), but does not simplify to elementary functions in general.
Entropy#
The differential entropy has a closed form in terms of the digamma function \(\psi\):
$$h = -\ln k + \frac{k-1}{k} + \left(1+\frac{1}{s}\right)\left[\psi\!\left(1+\frac{k}{s}\right) - \psi(1)\right].$$
k0, s0 = 2.5, 1.7
x_test = np.array([0.2, 0.5, 1.0, 2.0, 5.0])
pdf_np = mielke_pdf(x_test, k0, s0)
pdf_sp = stats.mielke.pdf(x_test, k0, s0)
print("max |pdf_numpy - pdf_scipy|:", np.max(np.abs(pdf_np - pdf_sp)))
# Relationship to Burr III / Dagum in SciPy: burr(c=s, d=k/s)
pdf_burr = stats.burr.pdf(x_test, s0, k0 / s0)
print("max |mielke - burr(c=s, d=k/s)|:", np.max(np.abs(pdf_sp - pdf_burr)))
mean, var, skew, exkurt = mielke_summary_stats(k0, s0)
mean_sp, var_sp, skew_sp, exkurt_sp = stats.mielke.stats(k0, s0, moments="mvsk")
print("mean:", mean, "(scipy:", float(mean_sp), ")")
print("var:", var, "(scipy:", float(var_sp), ")")
print("skew:", skew, "(scipy:", float(skew_sp), ")")
print("excess kurtosis:", exkurt, "(scipy:", float(exkurt_sp), ")")
h = mielke_entropy(k0, s0)
print("entropy:", h, "(scipy:", float(stats.mielke.entropy(k0, s0)), ")")
max |pdf_numpy - pdf_scipy|: 5.551115123125783e-17
max |mielke - burr(c=s, d=k/s)|: 1.6653345369377348e-16
mean: 2.495423340949614 (scipy: 2.4954233409496127 )
var: inf (scipy: inf )
skew: nan (scipy: nan )
excess kurtosis: nan (scipy: nan )
entropy: 1.6941719835054267 (scipy: 1.6941719835053937 )
5) Parameter Interpretation#
The parameters \(k\) and \(s\) both affect shape, but in different ways.
\(s\) (tail index)#
From the PDF, as \(x\to\infty\):
$$f(x) \sim k\,x^{-s-1}.$$
So:
larger \(s\) means a lighter right tail
the \(n\)-th moment exists iff \(n < s\) (mean requires \(s>1\), variance requires \(s>2\))
\(k\) (behavior near 0)#
As \(x\to 0^+\), \((1+x^s)^{-(1+k/s)}\to 1\), so
$$f(x) \approx k\,x^{k-1}.$$
If \(k<1\), the density diverges at 0.
If \(k=1\), the density is finite and nonzero at 0.
If \(k>1\), the density goes to 0 at 0.
A convenient derived quantity is \(k/s\) (it appears in the CDF and quantiles). For example, \(F(1) = (1+1)^{-k/s} = 2^{-k/s}\), so the probability of falling below 1 depends on the parameters only through the ratio \(k/s\).
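Since \(F(x) = (1+x^{-s})^{-k/s}\), the CDF at \(x=1\) equals \(2^{-k/s}\) and depends only on the ratio \(k/s\); a quick check with parameter pairs sharing that ratio:

```python
import numpy as np
from scipy import stats

# Distinct (k, s) pairs with the same ratio k/s agree at x = 1.
for k, s in [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]:
    print(stats.mielke.cdf(1.0, k, s), 2.0 ** (-k / s))
```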
x = np.logspace(-3, 3, 900)
# Effect of changing k (near-zero behavior)
s_fixed = 2.5
k_list = [0.5, 1.0, 3.0]
fig_pdf_k = go.Figure()
for k in k_list:
    y = np.maximum(mielke_pdf(x, k, s_fixed), 1e-300)
    fig_pdf_k.add_trace(go.Scatter(x=x, y=y, mode="lines", name=f"k={k}, s={s_fixed}"))
fig_pdf_k.update_layout(title="PDF shape when varying k (s fixed)")
fig_pdf_k.update_xaxes(type="log", title="x")
fig_pdf_k.update_yaxes(type="log", title="pdf(x)")
fig_pdf_k.show()
# Effect of changing s (tail heaviness)
k_fixed = 2.5
s_list = [0.9, 2.0, 5.0]
fig_pdf_s = go.Figure()
for s in s_list:
    y = np.maximum(mielke_pdf(x, k_fixed, s), 1e-300)
    fig_pdf_s.add_trace(go.Scatter(x=x, y=y, mode="lines", name=f"k={k_fixed}, s={s}"))
fig_pdf_s.update_layout(title="PDF shape when varying s (k fixed)")
fig_pdf_s.update_xaxes(type="log", title="x")
fig_pdf_s.update_yaxes(type="log", title="pdf(x)")
fig_pdf_s.show()
6) Derivations#
Expectation (raw moments)#
Start from the raw moment definition (standard form):
$$\mathbb{E}[X^n] = \int_0^\infty x^n\,\frac{k\,x^{k-1}}{(1+x^s)^{1+k/s}}\,dx.$$
Combine powers and substitute \(y=x^s\):
\(x=y^{1/s}\)
\(dx = \frac{1}{s}y^{1/s-1}\,dy\)
Then
$$\mathbb{E}[X^n] = \frac{k}{s}\int_0^\infty \frac{y^{(k+n)/s - 1}}{(1+y)^{1+k/s}}\,dy.$$
Recognize the Beta-function identity
$$\int_0^\infty \frac{y^{a-1}}{(1+y)^{a+b}}\,dy = B(a,b) = \frac{\Gamma(a)\,\Gamma(b)}{\Gamma(a+b)}.$$
Here \(a=(k+n)/s\) and \(b=1-n/s\), so \(a+b = 1+k/s\). This requires \(a>0\) (i.e. \(n>-k\)) and \(b>0\) (i.e. \(n<s\)). Plugging in and using \(\Gamma(1+k/s)=(k/s)\Gamma(k/s)\) yields:
$$\mathbb{E}[X^n] = \frac{k}{s}\cdot\frac{\Gamma\!\left(\tfrac{k+n}{s}\right)\Gamma\!\left(1-\tfrac{n}{s}\right)}{\Gamma\!\left(1+\tfrac{k}{s}\right)} = \frac{\Gamma\!\left(\tfrac{k+n}{s}\right)\Gamma\!\left(1-\tfrac{n}{s}\right)}{\Gamma\!\left(\tfrac{k}{s}\right)}.$$
Variance#
When \(s>2\):
$$\mathrm{Var}(X) = \mathbb{E}[X^2] - \mathbb{E}[X]^2,$$
where \(\mathbb{E}[X]\) and \(\mathbb{E}[X^2]\) use the raw-moment formula above.
Likelihood#
Given i.i.d. data \(x_1,\dots,x_n\) with \(x_i>0\), the log-likelihood (standard form) is:
$$\ell(k,s) = n\ln k + (k-1)\sum_{i=1}^n \ln x_i - \left(1+\frac{k}{s}\right)\sum_{i=1}^n \ln\!\left(1+x_i^s\right).$$
There is no closed-form MLE; in practice you maximize \(\ell(k,s)\) numerically (SciPy’s fit does this for you).
def mielke_loglik(k, s, x):
    x = np.asarray(x, dtype=float)
    if (k <= 0) or (s <= 0) or np.any(x <= 0):
        return -np.inf
    return float(np.sum(mielke_logpdf(x, k, s)))
# quick sanity check: log-likelihood is higher near the true parameters (on average)
k_true, s_true = 2.5, 3.0
x_data = mielke_rvs_numpy(k_true, s_true, size=2500, rng=rng)
for (k_try, s_try) in [(1.8, 3.0), (2.5, 3.0), (3.2, 3.0), (2.5, 2.0), (2.5, 4.0)]:
    print((k_try, s_try), mielke_loglik(k_try, s_try, x_data))
(1.8, 3.0) -2317.0192153765065
(2.5, 3.0) -2192.250620713471
(3.2, 3.0) -2271.5919986517115
(2.5, 2.0) -2430.591198727745
(2.5, 4.0) -2332.061147185981
7) Sampling & Simulation (NumPy-only)#
Because the CDF is available in closed form, sampling is straightforward via inverse-transform sampling.
Algorithm#
Draw \(U\sim\mathrm{Uniform}(0,1)\).
Return \(X = Q(U)\) where
$$Q(p) = \left(\frac{p^{s/k}}{1 - p^{s/k}}\right)^{1/s}.$$
Equivalently (via the Beta connection):
draw \(V = U^{s/k}\) so that \(V\sim\mathrm{Beta}(k/s,1)\)
set \(X = \left(\frac{V}{1-V}\right)^{1/s}\).
The implementation above (mielke_rvs_numpy) uses the quantile function.
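The Beta-connection route can be sketched as follows (mielke_rvs_beta is a name introduced here for illustration); it should agree in distribution with the inverse-CDF sampler:

```python
import numpy as np
from scipy import stats

def mielke_rvs_beta(k, s, size, rng=None):
    """Sample X = (V/(1-V))**(1/s) with V ~ Beta(k/s, 1)."""
    rng = np.random.default_rng() if rng is None else rng
    v = rng.beta(k / s, 1.0, size=size)
    return (v / (1.0 - v)) ** (1.0 / s)

rng = np.random.default_rng(0)
xs = mielke_rvs_beta(2.5, 3.0, size=20_000, rng=rng)
# Compare the empirical median with the exact one (~0.9169 for k=2.5, s=3).
print(np.quantile(xs, 0.5), stats.mielke.ppf(0.5, 2.5, 3.0))
```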
k_samp, s_samp = 2.5, 3.0
samples = mielke_rvs_numpy(k_samp, s_samp, size=50_000, rng=rng)
qs = np.array([0.1, 0.5, 0.9, 0.99])
q_emp = np.quantile(samples, qs)
q_theory = mielke_ppf(qs, k_samp, s_samp)
print("Quantiles p:", qs)
print("Empirical:", q_emp)
print("Theory:", q_theory)
print("\nSample mean/var (finite here since s=3>2):")
mean_theory = mielke_raw_moment(1, k_samp, s_samp)
var_theory = mielke_raw_moment(2, k_samp, s_samp) - mean_theory**2
print("mean:", samples.mean(), "(theory:", mean_theory, ")")
print("var:", samples.var(), "(theory:", var_theory, ")")
Quantiles p: [0.1 0.5 0.9 0.99]
Empirical: [0.409355 0.921199 1.945194 4.387239]
Theory: [0.406851 0.916873 1.950439 4.351833]
Sample mean/var (finite here since s=3>2):
mean: 1.1137956021290585 (theory: 1.1129126745223055 )
var: 0.8749178323085646 (theory: 0.8646985368757909 )
8) Visualization#
We’ll visualize:
the theoretical PDF and CDF
Monte Carlo samples (histogram + PDF overlay)
empirical CDF vs theoretical CDF
k_vis, s_vis = 2.5, 3.0
x_grid = np.logspace(-3, 3, 900)
# PDF
fig_pdf = go.Figure()
fig_pdf.add_trace(
    go.Scatter(
        x=x_grid,
        y=np.maximum(mielke_pdf(x_grid, k_vis, s_vis), 1e-300),
        mode="lines",
        name="pdf",
    )
)
fig_pdf.update_layout(title=f"Mielke PDF (k={k_vis}, s={s_vis})")
fig_pdf.update_xaxes(type="log", title="x")
fig_pdf.update_yaxes(type="log", title="pdf(x)")
fig_pdf.show()
# CDF
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=x_grid, y=mielke_cdf(x_grid, k_vis, s_vis), mode="lines", name="cdf"))
fig_cdf.update_layout(title=f"Mielke CDF (k={k_vis}, s={s_vis})")
fig_cdf.update_xaxes(type="log", title="x")
fig_cdf.update_yaxes(title="cdf(x)")
fig_cdf.show()
# Monte Carlo samples: histogram + PDF overlay
samples_vis = mielke_rvs_numpy(k_vis, s_vis, size=30_000, rng=rng)
fig_hist = px.histogram(
    samples_vis,
    nbins=80,
    histnorm="probability density",
    log_x=True,
    opacity=0.55,
    title=f"Monte Carlo histogram vs PDF (k={k_vis}, s={s_vis})",
)
fig_hist.add_trace(go.Scatter(x=x_grid, y=mielke_pdf(x_grid, k_vis, s_vis), mode="lines", name="pdf"))
fig_hist.update_xaxes(title="x")
fig_hist.update_yaxes(title="density")
fig_hist.show()
# Empirical CDF vs theoretical CDF
x_sorted = np.sort(samples_vis)
ecdf = np.arange(1, len(x_sorted) + 1) / len(x_sorted)
fig_ecdf = go.Figure()
fig_ecdf.add_trace(go.Scatter(x=x_sorted, y=ecdf, mode="lines", name="empirical CDF"))
fig_ecdf.add_trace(go.Scatter(x=x_grid, y=mielke_cdf(x_grid, k_vis, s_vis), mode="lines", name="theoretical CDF"))
fig_ecdf.update_layout(title="Empirical vs theoretical CDF")
fig_ecdf.update_xaxes(type="log", title="x")
fig_ecdf.update_yaxes(title="CDF")
fig_ecdf.show()
9) SciPy Integration (scipy.stats.mielke)#
SciPy provides a full implementation:
stats.mielke.pdf, stats.mielke.logpdf
stats.mielke.cdf, stats.mielke.ppf
stats.mielke.rvs
stats.mielke.fit (MLE)
Remember the Burr relation:
stats.mielke(k, s) is equivalent to stats.burr(c=s, d=k/s).
k_true, s_true = 2.5, 3.0
dist = stats.mielke(k_true, s_true) # loc=0, scale=1 by default
x_eval = np.array([0.5, 1.0, 2.0, 5.0])
print("pdf:", dist.pdf(x_eval))
print("cdf:", dist.cdf(x_eval))
# rvs
data = dist.rvs(size=3000, random_state=rng)
print("sample min/max:", data.min(), data.max())
# fit (fix loc=0, scale=1 to estimate only k and s)
k_hat, s_hat, loc_hat, scale_hat = stats.mielke.fit(data, floc=0, fscale=1)
print("\nTrue (k,s):", (k_true, s_true))
print("Fit (k,s):", (k_hat, s_hat))
print("Returned loc/scale:", (loc_hat, scale_hat))
# Compare NumPy vs SciPy implementations numerically
x_dense = np.logspace(-3, 3, 1000)
max_pdf_diff = np.max(np.abs(mielke_pdf(x_dense, k_true, s_true) - dist.pdf(x_dense)))
max_cdf_diff = np.max(np.abs(mielke_cdf(x_dense, k_true, s_true) - dist.cdf(x_dense)))
print("\nmax |pdf_numpy - pdf_scipy|:", max_pdf_diff)
print("max |cdf_numpy - cdf_scipy|:", max_cdf_diff)
pdf: [0.712222 0.701539 0.125904 0.003942]
cdf: [0.16025 0.561231 0.906511 0.993382]
sample min/max: 0.025821632419133145 10.725107179530955
True (k,s): (2.5, 3.0)
Fit (k,s): (2.4983676379870374, 3.0199409878085794)
Returned loc/scale: (0, 1)
max |pdf_numpy - pdf_scipy|: 3.3306690738754696e-16
max |cdf_numpy - cdf_scipy|: 9.992007221626409e-16
10) Statistical Use Cases#
Hypothesis testing#
Nested model test: the case \(k=s\) is the log-logistic distribution (fisk). You can test
$$H_0: k=s\ \text{(log-logistic)}\quad\text{vs}\quad H_1: (k,s)\ \text{free}$$
using a likelihood-ratio test (LRT).
Goodness-of-fit: QQ-plots or distribution tests (KS/AD) can be used as diagnostics. Be careful: classical p-values assume parameters are known, while in practice they're often estimated.
Bayesian modeling#
For positive heavy-tailed data, you can use mielke as a likelihood and place priors on \((k,s)\) (e.g. log-normal or log-uniform). There is no conjugate prior, but posterior inference is straightforward with MCMC or a simple grid approximation in low dimensions.
Generative modeling#
Useful as a base distribution for positive heavy-tailed generative models.
Can be used in mixtures to model multimodal positive data.
The Beta/Beta-prime transformations make it convenient inside larger hierarchical models.
# Likelihood-ratio test example: H0 is log-logistic (fisk), H1 is mielke
c0 = 2.5
x = stats.fisk.rvs(c0, size=1500, random_state=rng)
# Fit under H1 (free k,s) and H0 (fisk shape c). Fix loc=0, scale=1 for simplicity.
k_hat1, s_hat1, _, _ = stats.mielke.fit(x, floc=0, fscale=1)
c_hat0, _, _ = stats.fisk.fit(x, floc=0, fscale=1)
ll1 = np.sum(stats.mielke.logpdf(x, k_hat1, s_hat1))
ll0 = np.sum(stats.fisk.logpdf(x, c_hat0))
lrt_stat = 2 * (ll1 - ll0)
p_value = stats.chi2.sf(lrt_stat, df=1)
print("True H0 (fisk c):", c0)
print("Fit H1 (k,s):", (k_hat1, s_hat1))
print("Fit H0 (fisk c):", c_hat0)
print("LRT stat:", float(lrt_stat))
print("Approx p-value (chi^2_1):", float(p_value))
True H0 (fisk c): 2.5
Fit H1 (k,s): (2.525228570404704, 2.5103231593958144)
Fit H0 (fisk c): 2.5155273437500036
LRT stat: 0.043273791422052454
Approx p-value (chi^2_1): 0.8352105904552609
# Simple Bayesian grid posterior over (k,s) with a log-uniform prior p(k,s) ∝ 1/(k s)
# This is an approximation for intuition (not a replacement for MCMC for serious work).
k_true, s_true = 2.5, 3.0
data = stats.mielke.rvs(k_true, s_true, size=400, random_state=rng)
logx = np.log(data)
sum_logx = logx.sum()
n = data.size
k_grid = np.linspace(0.4, 6.0, 90)
s_grid = np.linspace(1.1, 6.0, 90) # avoid extremely heavy tails for this demo
log_post = np.empty((k_grid.size, s_grid.size), dtype=float)
for j, s in enumerate(s_grid):
    # sum_i log(1 + x_i^s) computed stably
    s_term = np.logaddexp(0.0, s * logx).sum()
    # vectorized log-likelihood over k_grid
    loglike = n * np.log(k_grid) + (k_grid - 1.0) * sum_logx - (1.0 + k_grid / s) * s_term
    # log-uniform prior over the grid bounds
    logprior = -np.log(k_grid) - np.log(s)
    log_post[:, j] = loglike + logprior
# Normalize on the discrete grid (treating cells as equal-area for visualization)
log_post -= logsumexp(log_post)
post = np.exp(log_post)
i_map, j_map = np.unravel_index(np.argmax(post), post.shape)
print("True (k,s):", (k_true, s_true))
print("MAP (k,s):", (float(k_grid[i_map]), float(s_grid[j_map])))
fig_post = go.Figure(
    data=go.Contour(
        x=s_grid,
        y=k_grid,
        z=post,
        contours_coloring="heatmap",
        colorbar_title="posterior",
    )
)
fig_post.update_layout(title="Grid posterior p(k,s | data) with log-uniform prior")
fig_post.update_xaxes(title="s")
fig_post.update_yaxes(title="k")
fig_post.show()
True (k,s): (2.5, 3.0)
MAP (k,s): (2.4134831460674153, 3.0820224719101126)
11) Pitfalls#
Invalid parameters: require \(k>0\) and \(s>0\) (and for SciPy location/scale, scale > 0).
Moment non-existence: mean requires \(s>1\), variance requires \(s>2\), etc. If \(s\le 1\), sample means are unstable and can be misleading.
Near-zero behavior: if \(k<1\), the density diverges at 0; this is not a bug.
Numerical issues: direct computation of \(x^s\) can overflow for large \(x\) and large \(s\). Prefer log-space identities like log(1 + x^s) = logaddexp(0, s * log(x)).
Fitting can be delicate: heavy tails can produce extreme outliers; MLE may be sensitive to initialization, and the parameter estimates may be strongly correlated.
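A minimal illustration of the overflow pitfall (the value of x below is chosen specifically to force float64 overflow):

```python
import numpy as np

x = np.float64(1e200)
s = 3.0
with np.errstate(over="ignore"):
    naive = np.log1p(x**s)  # x**s overflows to inf, so the log becomes inf
stable = np.logaddexp(0.0, s * np.log(x))  # log(1 + x^s) stays finite
print(naive, stable)
```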
12) Summary#
mielke is a continuous distribution on \(x>0\) with power-law tails.
PDF: \(f(x)=\dfrac{k x^{k-1}}{(1+x^s)^{1+k/s}}\); CDF: \(F(x)=(1+x^{-s})^{-k/s}\).
Tail index is \(s\): positive moments exist iff \(n<s\).
Closed-form raw moments and entropy are available via Gamma/digamma functions.
Sampling is easy via the closed-form quantile function (inverse CDF).
In SciPy: scipy.stats.mielke (equivalently scipy.stats.burr(c=s, d=k/s)).